Automatically Increasing Fault Tolerance in Distributed Systems

Authors

  • Rida Adnan Bazzi
  • Mustaque Ahamad
  • James Burns
  • Jim Burns
  • Ken Calvert
  • H. Venkateswaran
  • Hernan Astudillo
  • Ranjit John
  • Rimli Sengupta
  • Ivan Yanasak
Abstract

Acknowledgments

The guidance and encouragement of my advisor Gil Neiger were invaluable. For that and for his understanding, I thank him. Also, I would like to thank the members of my committee, Mustaque Ahamad, Jim Burns, Ken Calvert, and H. Venkateswaran, for their help and feedback. Many fellow students made my stay at Georgia Tech more enjoyable. Especially, I would like to thank ... I would like to thank the Hariri Foundation for making graduate school possible. In particular, I would like to thank Marc Muething and David Thompson at the Hariri Foundation for their help. Most of all, I would like to thank my wife Lina for her love, support and encouragement, not to mention her proofreading parts of the thesis. Last, but not least, I thank my parents for everything they have done for me.

Summary

Developing fault-tolerant distributed protocols is a difficult task. The difficulty of this task increases with the severity of the failures to be tolerated. One way to deal with this difficulty is to develop protocols tolerant of benign failures and then transform these protocols into ones that are tolerant of more severe failures. This transformation mechanism is called a translation.

This dissertation considers a variety of processor failures and synchrony models. The failures studied range from simple stopping failures to arbitrary faulty behavior. The synchrony models range from systems in which processors are fully synchronized (synchronous systems) to systems in which processors are not synchronized at all (asynchronous systems). For all synchrony models, the dissertation gives general definitions of translations and of measures to evaluate their performance. The two measures considered are communication complexity and fault-tolerance. Communication complexity is the communication overhead incurred when using a translation. Fault-tolerance is the maximum proportion of processors that can be faulty without affecting the correctness of the translations.

For synchronous systems, this dissertation presents a complete study of the relationship between fault-tolerance and round complexity of translations. It develops new translations that are optimal and proves that some previously developed translations are optimal. For asynchronous systems, it proves that some previously developed translations are optimal. For systems that are only partially synchronous, this dissertation discusses some of the issues involved in designing efficient translations.


Similar resources

Minimum-Process Synchronous Checkpointing in Mobile Distributed Systems

Checkpointing is an efficient fault tolerance technique used in distributed systems. Due to the emerging challenges of mobile distributed systems, such as low bandwidth, mobility, lack of stable storage, frequent disconnections and limited battery life, the fault tolerance techniques designed for distributed systems cannot be directly implemented on mobile distributed systems (MDSs). This research pape...


Reliability and Performance Evaluation of Fault-aware Routing Methods for Network-on-Chip Architectures (RESEARCH NOTE)

Nowadays, faults and failures are increasing, especially in complex systems such as Network-on-Chip (NoC) based Systems-on-a-Chip, due to increasing susceptibility and decreasing feature sizes. On the other hand, fault-tolerant routing algorithms have an evident effect on tolerating permanent faults and improving the reliability of a Network-on-Chip based system. This paper presents reliabili...


Real-Time Fault-Tolerant Atomic Broadcast

We present algorithms for Real-Time Fault-Tolerant Uniform Atomic Broadcast developed in the framework of the French project ATR (accord temps réel). We first design a distributed execution model for asynchronous systems with crash failures, which we call the Synchronized Phase System (SPS), then we give an algorithm for Atomic Broadcast in SPS. In an SPS, the processes try to run in synchronized rounds ...


An approach to fault detection and correction in design of systems using of Turbo codes

We present an approach to the design of fault-tolerant computing systems. In this paper, a technique is employed that enables the combination of several codes, in order to obtain flexibility in the design of error-correcting codes. Code combining techniques are very effective, and turbo codes are one such code. The Algorithm-based fault tolerance techniques that to detect errors rely on the c...


Influence of Fault Current Limiter in Voltage Drop and TRV Considering Wind Farm

The influence of distributed generation systems on distribution systems can increase the level of short-circuit current. The effectiveness of distributed generation systems is affected by the size, location, and type of distributed generation technology, and by the methods of connecting to distribution systems. A wind turbine system is one example of a distributed generation source. Not only does...



Publication date: 1994